Affiliations:
  1. Max Planck Institute for Evolutionary Anthropology
  2. University of California Santa Barbara
  3. University of California Los Angeles
  4. Anglia Ruskin University

*Corresponding author:

#Make code wrap text so it doesn't go off the page when Knitting to PDF
library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=60),tidy=TRUE)

Cite as: Logan CJ, McCune K, MacPherson M, Johnson-Ulrich Z, Bergeron L, Rowney C, Seitz B, Blaisdell A, Folsom M, Wascher CAF. 2019. Are the more flexible individuals also better at inhibition? (http://corinalogan.com/Preregistrations/g_inhibition.html) In principle acceptance by PCI Ecology of the version on 6 Mar 2019 https://github.com/corinalogan/grackles/blob/master/Files/Preregistrations/g_inhibition.Rmd.

This preregistration has been pre-study peer reviewed and received an In Principle Recommendation by:

Erin Vogel (2019) Adapting to a changing environment: advancing our understanding of the mechanisms that lead to behavioral flexibility. Peer Community in Ecology, 100016. 10.24072/pci.ecology.100016

  • Reviewers: Simon Gingins and two anonymous reviewers

ABSTRACT

This is one of the first studies planned for our long-term research on the role of behavioral flexibility in rapid geographic range expansions. Project background: Behavioral flexibility, the ability to change behavior when circumstances change based on learning from previous experience (Mikhalevich, Powell, and Logan (2017)), is thought to play an important role in a species’ ability to successfully adapt to new environments and expand its geographic range (e.g., (Lefebvre et al. 1997), (Griffin and Guez 2014), (Chow, Lea, and Leaver 2016), (Sol and Lefebvre 2000), (Sol, Timmermans, and Lefebvre 2002), (Sol et al. 2005)). However, behavioral flexibility is rarely directly tested at the individual level, which limits our ability to determine how it relates to other traits and weakens predictions about a species’ ability to adapt behavior to new environments. We use great-tailed grackles (a bird species) as a model to investigate this question because they have rapidly expanded their range into North America over the past 140 years ((Wehtje 2003), (Peer 2011)) (Fig. 1). This investigation: In this piece of the long-term project, we aim to measure grackle inhibition in three experiments (delay of gratification, go-no go, detour) to determine whether the individuals that are more flexible are also better at inhibiting. Results will allow us to determine whether inhibition is linked with measures of flexibility (reversal learning and solution switching), whether performance on these inhibition tests relates across tasks (and thus whether they measure the same or different traits), and whether an inhibition task administered using a touch screen can be validated.

Figure 1. Five-year project overview. The same individuals will experience the experiments listed in each column (i.e., the same ~32 individuals in the left column (Years 1 and 2) will experience numbers 1-9, and the same ~32 individuals in the right column (Years 3-5) will experience A-D, plus numbers 5-9).

INTRODUCTION

RESULTS

Table 1. Summarized results per bird in the go no-go, detour, reversal, and multiaccess box (MAB) experiments.

d <- read.csv(url("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/g_inhibition_datasummary.csv"), 
    header = F, sep = ",", stringsAsFactors = F)

d <- data.frame(d)

library(reactable)
reactable(d, highlight = TRUE, bordered = FALSE, compact = TRUE, 
    wrap = TRUE, resizable = TRUE, columns = list(V1 = colDef(name = "Bird name"), 
        V2 = colDef(name = "Bird ID"), V3 = colDef(name = "Go no-go trials to 85% correct after 150 trials"), 
        V4 = colDef(name = "Go no-go trials to 85% correct"), 
        V5 = colDef(name = "Detour % correct"), V6 = colDef(name = "Detour % correct modified"), 
        V7 = colDef(name = "Detour pre- or post-reversal"), V8 = colDef(name = "Trials to reverse in last reversal"), 
        V9 = colDef(name = "Average latency to attempt new solution (MAB)")))

DISCUSSION

A. STATE OF THE DATA

Prior to collecting any data: This preregistration was written.

After data collection had begun (and before any data analysis): This preregistration was submitted to PCI Ecology (Oct 2018) for peer review after starting data collection on the detour task for the pre-reversal subcategory of subjects (for which there was data from one bird). Reviews were received, the preregistration was revised and resubmitted to PCI Ecology (Jan 2019) at which point there was detour data for six birds, data on a few training trials for the delay of gratification task for one bird, and no data from the go no-go experiment. This preregistration passed peer review and was recommended by PCI Ecology in March 2019 (see the review history).

After data collection began and before data analysis: how the actual methods differed from the planned methods

  1. Jan 2020: we discovered that none of the grackles reached 100% accuracy within 150 trials (at least not at the level of 20-trial blocks), which is consistent with their reversal performance, where they usually do not prefer one option 100% of the time but continue to occasionally explore the other option. The passing criterion of 100% correct within 150 trials, or 85% correct between 150-200 trials, could be the reason there was not much individual variation in this test (birds either passed in 160-190 trials or did not reach 85% accuracy within 200 trials). Because all grackles received 150+ trials, we were only measuring variation after 150 trials rather than variation across all trials. We decided to add a post-hoc passing criterion that might be more illustrative of individual differences in inhibition in grackles: 85% accuracy at the level of the most recent sliding 10-trial block (i.e., the most recent 10 trials, regardless of whether the total is an even 20, 30, or 40 trials). We added this modified response variable post-hoc to the discussion. We predict this new passing criterion will show more individual variation and will more accurately represent individual differences in grackle inhibitory abilities.

  2. Jul 2020: Independent variables > P1 go no-go > Model 2b: removed the variable “flexibility condition” because, by definition, the birds in the manipulated condition were faster to reverse.
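The sliding-window criterion described in point 1 above can be sketched in base R. The function name and the example choice vector are hypothetical illustrations, not the project's actual analysis code; `choices` codes each trial as 1 (correct) or 0 (incorrect), with the most recent trial last.

```r
# First trial at which accuracy reaches >= 85% in the most recent
# 10 trials (a sliding window), or NA if the bird never reaches it.
trials_to_sliding_criterion <- function(choices, window = 10, threshold = 0.85) {
    if (length(choices) < window) return(NA_integer_)
    for (t in window:length(choices)) {
        if (mean(choices[(t - window + 1):t]) >= threshold) return(t)
    }
    NA_integer_
}

# Hypothetical bird: inaccurate early, accurate from trial 8 onward
choices <- c(0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
trials_to_sliding_criterion(choices)  # 13: trials 4-13 contain 9/10 correct
```

Unlike the original block-based criterion, this window advances one trial at a time, so a bird is credited as soon as any 10 consecutive recent trials reach 85% accuracy.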

B. PARTITIONING THE RESULTS

We may decide to present the results from different tests in separate papers.

C. HYPOTHESIS

If flexibility requires inhibition, then individuals that are more behaviorally flexible (indicated by individuals that are faster at functionally changing their behavior when circumstances change), as measured by reversal learning and switching between options on a multi-access box, will also be better at inhibiting their responses in three tasks: delayed gratification, go no-go, and detour.

P1: Individuals that are faster to reverse preferences on a reversal learning task and who also have lower latencies to successfully solve new loci after previously solved loci become unavailable (multi-access box) (see flexibility preregistration) will perform better in the go no-go task (methods similar to Harding, Paul, and Mendl (2004)), in the detour task (methods as in MacLean et al. (2014) who call it the “cylinder task”), and they will wait longer for higher quality (more preferred) food, but not for higher quantities (methods as in Hillemann et al. (2014)). Waiting for higher quality food has been validated as a test of inhibition in birds, while waiting for a higher quantity of food does not appear to measure inhibition (Hillemann et al. (2014)).

P1 alternative 1: If there is no correlation between flexibility measures and performance on the inhibition tasks, this may indicate that the flexibility tasks may not require much inhibition (particularly if the inhibition results are reliable - see P1 alternative 2).

P1 alternative 2: If there is no correlation between flexibility measures and performance on the inhibition tasks, this may indicate that the inhibition tasks had low reliability and were therefore too noisy to correlate with flexibility.

P2: If there is no correlation in performance across inhibition tasks, it may indicate that one or more of these tasks does not measure inhibition, or that they measure different types of inhibition (see Friedman and Miyake (2004)).

P2 alternative: If go no-go task performance strongly correlates with performance on the delayed gratification task, this indicates these two tasks measure the same trait, which therefore validates an inhibition task using a touch screen (the go no-go task).

P3: If individuals perform well on the detour task and with little individual variation, this is potentially because they will have had extensive experience looking into the sides of opaque tubes during reversal learning. To determine whether prior experience with opaque tubes in reversal learning contributed to their detour performance, a subset of individuals will experience the detour task before any reversal learning tests. If this subset performs the same as the others, then previous experience with tubes does not influence detour task performance. If the subset performs worse than the others, this indicates that detour task performance depends on the previous experiences of the individuals tested.


Figure 2. The experimental designs of the three tasks: delayed gratification, go no-go, and detour (see protocol for details). In the delay of gratification task, individuals learn that food items will be transferred by the experimenter from a storing lid (near the experimenter) to a serving lid (near the bird) one at a time, and that they have access to the food in the serving lid from which they can eat at any time: they will learn that they will have access to more food if they wait longer for the experimenter to transfer food items. Once they pass training (by waiting for more than one food item in three trials), they move on to the test, where food items are transferred from the storing to the serving lid with delays ranging from 2-1280 seconds. Birds will be tested on whether they are willing to wait for food items that increase in quality (i.e., are more preferred) or increase in quantity (i.e., the same food type accumulates in the serving lid). In the go no-go task, after pecking a start key on the touch screen to show they are interested in attending to a trial, they will see either a green circle or a purple circle (the rewarded circle color is counterbalanced across birds). Pecking the food key while the rewarded colored circle (green in the figure) is on the screen will result in the food hopper rising so the bird can eat food for 2 seconds, after which point the trial ends and the screen goes blank for 8 seconds before starting over again. If the non-rewarded colored circle (purple in the figure) appears on the screen after the start key is pecked, then the correct response is to refrain from pecking the food key for 10 seconds. If the bird succeeds in refraining, the next intertrial interval starts. If the bird fails and pecks the food key while the purple circle is on the screen, then it is given an aversive stimulus for 5 seconds (a TV static screen). In the detour task, individuals first receive a warm up with an opaque tube where they learn that the experimenter will show them a piece of food and then move that piece of food into the tube. They then have the opportunity to approach the tube and eat the food. A correct response is when their first approach is to go around to the tube’s side opening to obtain the food, and an incorrect response is when they try to access the food by pecking at the front of the tube (which has no opening). Once they pass the warm up, they move on to the test, which is exactly the same except that the tube is transparent. The idea is that being able to see the food through the tube wall might entice them to try to go through the wall rather than refrain from a direct approach to the food and instead go around the side through the tube opening.

D. METHODS

Open materials

Testing protocols for all three experiments: delayed gratification, go no-go, and detour

Open data

When the study is complete, the data will be published in the Knowledge Network for Biocomplexity’s data repository.

Randomization and counterbalancing

P3

Two individuals from each batch will experience the detour task before participating in the flexibility manipulation. These individuals will be randomly selected using the random number generator at https://www.random.org.

P1-P2

For the rest of the individuals (n=6 per batch), the order of the three inhibition tasks will be counterbalanced across birds (using https://www.random.org to randomly assign individuals to one of three experimental orders). 1/3 of the individuals will experience:

  1. Delayed gratification task

  2. Go no-go task

  3. Detour

1/3 of the individuals will experience:

  1. Go no-go task

  2. Detour

  3. Delayed gratification task

1/3 of the individuals will experience:

  1. Detour

  2. Delayed gratification task

  3. Go no-go task
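The actual assignment uses random.org, but the counterbalancing logic above can be sketched in base R (the seed and the batch composition are hypothetical illustrations; the bird names are drawn from elsewhere in this preregistration):

```r
set.seed(123)  # illustrative seed only; the real assignment uses random.org
task_orders <- list(
    order1 = c("delayed gratification", "go no-go", "detour"),
    order2 = c("go no-go", "detour", "delayed gratification"),
    order3 = c("detour", "delayed gratification", "go no-go")
)
# Hypothetical batch of 6 birds, randomly split into the 3 orders (2 each)
birds <- c("Queso", "Mole", "Habanero", "Diablo", "Burrito", "Chalupa")
assignment <- split(sample(birds), rep(names(task_orders), each = 2))
assignment  # which birds experience which task order
```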

Delayed gratification

  • Food preference test: food will be presented in random combinations over six sessions of 12-15 trials.

  • Training trials: The type of demonstration and training trials will vary randomly (with more demo trials near the beginning of training), incorporating trials in which food of the same sort accumulates (quantity), trials in which food of ascending quality accumulates (quality), and trials in which we add increasingly larger food pieces throughout the trial (size).

  • Test: we will test each food quality (low, mid, high) twice in randomized order in each session.

Go no-go

Go and no-go trials will be presented randomly with the restriction that no more than four of the same type will occur in a row. The rewarded color will be counterbalanced across birds.
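The run-length restriction above can be sketched as a sequential draw in base R (the function is a hypothetical illustration, not the software actually used to schedule trials):

```r
# Draw go/no-go trial types at random, forcing a switch whenever the
# previous 4 trials were all the same type, so runs never exceed 4.
make_gonogo_sequence <- function(n_trials, max_run = 4) {
    types <- character(n_trials)
    for (i in seq_len(n_trials)) {
        options <- c("go", "no-go")
        if (i > max_run && length(unique(types[(i - max_run):(i - 1)])) == 1) {
            options <- setdiff(options, types[i - 1])  # force the other type
        }
        types[i] <- sample(options, 1)
    }
    types
}

s <- make_gonogo_sequence(100)
max(rle(s)$lengths)  # never more than 4 of the same type in a row
```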

Detour

The side from which the apparatus is baited will be consistent within subjects, but counterbalanced across subjects.

Blinding of conditions during analysis

No blinding is involved in this study.

Dependent variables

P1: the more flexible individuals are better at inhibition
  1. Delayed gratification: Number of food pieces waited for (0-3). A successful wait is defined as waiting for at least one of the three possible additional food items to be added to the serving lid, and accepting at least one of the reward pieces.

  2. Go no-go:

    1. The number of trials to reach criterion (85% correct) where correct responses involve pecking when the rewarded stimulus is displayed and not pecking when the unrewarded stimulus is displayed, and incorrect responses involve pecking when the unrewarded stimulus is displayed, and not pecking when the rewarded stimulus is displayed

    2. The latency to respond (peck the target key)

  3. Detour: First approach (physical contact): Correct (to the tube’s side opening) or Incorrect (to the front unopen area of the tube) (methods as in MacLean et al. (2014)).

One model will be run per dependent variable.

P3: does training improve detour performance?
  1. First approach (physical contact): Correct (to the tube’s side opening) or Incorrect (to the front unopen area of the tube) (methods as in MacLean et al. (2014)).

Independent variables

P1: delayed gratification
  1. Food quality or quantity (Quality: High, Med, Low; Quantity: Smaller, Medium, Larger)

  2. Trial

  3. Delay (2, 5, 10, 20, 40, 60, or 80 seconds)

  4. Flexibility 1: Number of trials to reverse a preference in the last reversal an individual experienced (reversal learning; an individual is considered to have a preference if it chose the rewarded option at least 17 out of the most recent 20 trials, with a minimum of 8 or 9 correct choices out of 10 on the two most recent sets of 10 trials). See behavioral flexibility preregistration.

  5. Flexibility 3: If the number of trials to reverse a preference does not positively correlate with the latency to attempt or solve new loci on the multi-access box (an additional measure of behavioral flexibility), then the average latency to solve and the average latency to attempt a new option on the multi-access box will be additional independent variables. See behavioral flexibility preregistration.

  6. Flexibility 4: This measure is currently being developed and is intended to be a more accurate representation of all of the choices an individual made, as well as accounting for the degree of uncertainty exhibited by individuals as preferences change. If this measure more effectively represents flexibility (determined using a modeled dataset and not the actual data), we may decide to solely rely on this measure and not use flexibility measures 1 through 3. If this ends up being the case, we will modify the code in the analysis plan below to reflect this change.
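The reversal passing criterion in the Flexibility 1 variable above can be sketched in base R. This is a hypothetical illustration, not the project's analysis code; we read "a minimum of 8 or 9 correct choices out of 10" as at least 8 per 10-trial set.

```r
# TRUE if the bird currently has a preference: >= 17 of the most recent
# 20 choices correct, with >= 8 of 10 correct in each of the two most
# recent 10-trial sets. 'choices' is a 0/1 vector, most recent trial last.
has_preference <- function(choices) {
    n <- length(choices)
    if (n < 20) return(FALSE)
    last20 <- choices[(n - 19):n]
    sum(last20) >= 17 && sum(last20[1:10]) >= 8 && sum(last20[11:20]) >= 8
}

has_preference(rep(1, 20))                      # TRUE
has_preference(c(rep(1, 10), rep(c(1, 0), 5)))  # FALSE: only 15/20 correct
```

Note that the per-set minimum matters: 17/20 overall can still fail if one 10-trial set contains fewer than 8 correct choices.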

P1: go no-go

Model 2a: number of trials to reach criterion

  1. Flexibility 1: Number of trials to reverse a preference in the last reversal an individual experienced (reversal learning; as above)

  2. Flexibility 3: If the number of trials to reverse a preference does not positively correlate with the latency to attempt or solve new loci on the multi-access box, then the average latency to solve and the average latency to attempt a new option on the multi-access box will be additional independent variables (as above).

  3. Flexibility 4: This measure is currently being developed and is intended to be a more accurate representation of all of the choices an individual made, as well as accounting for the degree of uncertainty exhibited by individuals as preferences change. If this measure more effectively represents flexibility (determined using a modeled dataset and not the actual data), we may decide to solely rely on this measure and not use flexibility measures 1 through 3. If this ends up being the case, we will modify the code in the analysis plan below to reflect this change.

Model 2b: latency to respond

  1. Correct or incorrect response

  2. Trial

  3. Flexibility Condition: control, flexibility manipulation

  4. ID (random effect because multiple measures per bird)

NOTE Jul 2020: remove flexibility condition as a variable because, by definition, the birds in the manipulated group were faster to reverse their preferences.

P1: detour
  1. Trial

NOTE (Aug 2020): Because the data are analyzed in a GLM with only one row per bird, trial number cannot be included: it would require multiple rows per bird. Therefore, we removed this independent variable from this analysis.

  1. Flexibility 1: Number of trials to reverse a preference in the last reversal an individual experienced (reversal learning; as above)

  2. Flexibility 3: If the number of trials to reverse a preference does not positively correlate with the latency to attempt or solve new loci on the multi-access box, then the average latency to solve and the average latency to attempt a new option on the multi-access box will be additional independent variables (as above).

  3. Flexibility 4: This measure is currently being developed and is intended to be a more accurate representation of all of the choices an individual made, as well as accounting for the degree of uncertainty exhibited by individuals as preferences change. If this measure more effectively represents flexibility (determined using a modeled dataset and not the actual data), we may decide to solely rely on this measure and not use flexibility measures 1 through 3. If this ends up being the case, we will modify the code in the analysis plan below to reflect this change.

P3: does training improve detour performance?
  1. Condition: pre- or post-reversal learning tests

Interobserver reliability of dependent variables

To determine whether experimenters coded the dependent variables in a repeatable way, hypothesis-blind video coders, Sophie Kaube (detour) and Brynna Hood (go no-go), were first trained in video coding the dependent variables (detour and go no-go: whether the bird made the correct choice or not), requiring a Cohen’s unweighted kappa of 0.90 or above to pass training (using the psych package in R Revelle (2014)). This threshold indicates that the two coders (the experimenter and the video coder) agree with each other to a high degree (CITE). After passing training, the video coders coded 24% (detour) and 33% (go no-go) of the videos for each experiment.

Detour: correct choice

We randomly chose four (Diablo, Queso, Chalupa, and Habanero) of the 11 birds that had participated in this experiment by Nov 2019 using random.org. First, Kaube analyzed all videos from Habanero and Diablo, and we analyzed the data using an intraclass correlation coefficient, which is not an appropriate test for categorical data. After learning this, we switched to using the Cohen’s unweighted kappa and replaced Habanero and Diablo with two new randomly chosen grackles (Mole and Chilaquile). Kaube then analyzed all videos from Queso and Chalupa for training and passed (Cohen’s unweighted kappa=0.91, confidence boundary=0.75-1.00, n=24 data points). After passing training, Kaube analyzed all videos from Queso, Chalupa, Mole, and Chilaquile, and highly agreed with the experimenter’s data (Cohen’s unweighted kappa=0.91, confidence boundary=0.78-1.00, n=44 data points).

Go no-go: correct choice

We randomly chose three (Diablo, Burrito, and Chilaquile) of the 12 birds that were estimated to complete this experiment using random.org. Hood first analyzed all videos from Diablo for training and passed (Cohen’s unweighted kappa=0.91, confidence boundary=0.80-1.00, n=40 data points). After passing training, Hood coded the rest of the videos and showed substantial agreement with the experimenters (Cohen’s unweighted kappa = 0.82, confidence boundary = 0.78-0.85, n=611 data points).

We think the lower interobserver agreement for this variable arose because the correct choice data were not as objective to code as we had hoped: the touchscreen malfunctioned (failing to register touches to the screen), and the criterion that the bird had to be within a certain distance of the screen to be considered paying attention (and thus in a position to make a choice) was subjective. This indicates that our touchscreen setup could be greatly improved so that it is fully automated, rather than needing experimenter intervention for every trial.

Go no-go: latency to respond (peck the screen)

Interobserver reliability was not conducted on this variable because we obtained these data from the automatically generated PsychoPy data sheets. However, we must note that when entering the latency to first screen peck into the main data sheet, which the experimenter used to determine whether the bird made a correct choice or not, the two data sheets did not always match. This is because: 1) if a session started or ended with the bird not participating such that a trial was not triggered, this received a -1 in the experimenter’s data sheet and was not recorded in the PsychoPy data sheet; and 2) the touchscreen regularly failed to register screen pecks, which could result in an NA in the PsychoPy data sheet whereas the experimenter’s data sheet recorded a choice.

library(irr)  #for the icc() function
library(psych)  #for the cohen.kappa() function used below

#### DETOUR

# did Sophie Kaube pass interobserver reliability training?
# YES (0.91) on 10 Feb 2020

## 9 Jan 2020 = ICC = 0.4 because we differed on 2/21 data
## points. However, Sophie was correct (CL checked the video)
## because she followed the instructions in an unbiased way. I
## need to rethink the instructions and then give her another
## video or two to code
ld <- c(1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 
    1, 1, 1)  #live coder data from AdaptedFirstTouchCorrectChoice column for A036R- 2019-01-07 Detour S1 T1, A036R- 2019-01-08 Detour S2 T3, A064LR 2019-11-07 Detour S1 T1
vd <- c(1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 2, 2, 
    1, 1, 1)  #video coder data for same videos
vdtest <- c(1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 
    1, 1, 1, 1, 1)  #video coder data for same videos 
d <- data.frame(ld, vdtest)

icc(d, model = "oneway", type = "consistency", unit = "single", 
    conf.level = 0.95)  #=0.412. The ICC is low because it treats the distance between scores (0-2) as meaningful, whereas these scores are categories with no inherent distance. Changing the type to 'consistency' rather than 'agreement' would seem to be the fix (http://neoacademic.com/2011/11/16/computing-intraclass-correlations-icc-as-estimates-of-interrater-reliability-in-spss/), but the icc documentation notes that a oneway model already implies consistency, so we instead switched to Cohen's unweighted kappa (below).

## 10 Feb 2020: Did Sophie pass IOR for the second 2 birds she
## coded? YES 0.91 NOTE: don't just add more videos to the
## above analysis to try to get her ICC up because it would
## take more than 4 extra videos to counteract the 2
## disagreements. Instead, start from scratch on different
## videos and have her recode the above after she passes.
ld2 <- c(1, 1, 1, -1, 1, 2, -1, 1, 1, 1, 1, -1, 2, 1, 1, 1, 0, 
    1, 1, 1, 1, 1, -1, 1)  #live coder data from AdaptedFirstTouchCorrectChoice column for the same videos as vd2
vd2 <- c(0, 1, 1, -1, 1, 2, -1, 1, 1, 1, 1, -1, 2, 1, 1, 1, 0, 
    1, 1, 1, 1, 1, -1, 1)  #video coder data from AdaptedFirstTouchCorrectChoice for videos A025GO 2018-12-30 Detour Up S1 T1, A025GO 2018-12-31 Detour S2 T4, A025GO 2018-12-31 Detour S3 T6, A025GO 2018-12-31 Detour S4 T9, A025GO 2019-01-01 Detour S5 T10, A031-Y 2018-10-08 Detour S1 T1, A031-Y 2018-10-09 Detour S2 T10
d5 <- data.frame(ld2 + 2, vd2 + 2)

cohen.kappa(d5, w = NULL, n.obs = NULL, alpha = 0.05, levels = NULL)  #Sophie = unweighted kappa = 0.91, confidence boundary=0.75-1.00, n=24 data points
# Now have Sophie code 2 additional birds (don't need any
# more IOR because she already passed) and then 20% of the
# videos will be double coded

## 19 Feb 2020: Sophie IOR score for 20% of the videos (birds
## 25 Chalupa, 31 Queso, 35 Mole, 86 Chilaquile) = 0.91
ld3 <- c(1, 1, 1, -1, 1, 2, -1, 1, 1, 1, 1, -1, 2, 1, 1, 1, 0, 
    1, 1, 1, 1, 1, -1, 1, 0, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 
    2, 1, 2, 1, 1, 1, 1, 2)  #live coder data from AdaptedFirstTouchCorrectChoice column for the same videos as vd2
vd3 <- c(0, 1, 1, -1, 1, 2, -1, 1, 1, 1, 1, -1, 2, 1, 1, 1, 0, 
    1, 1, 1, 1, 1, -1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 
    2, 1, 2, 1, 1, 1, 1, 2)  #video coder data from AdaptedFirstTouchCorrectChoice for videos A025GO 2018-12-30 Detour Up S1 T1, A025GO 2018-12-31 Detour S2 T4, A025GO 2018-12-31 Detour S3 T6, A025GO 2018-12-31 Detour S4 T9, A025GO 2019-01-01 Detour S5 T10, A031-Y 2018-10-08 Detour S1 T1, A031-Y 2018-10-09 Detour S2 T10, A035P- 2018-12-20 Detour S1 T1, A086GB 2020-01-01 Detour S1 T1
dfinal <- data.frame(ld3 + 2, vd3 + 2)
cohen.kappa(dfinal, w = NULL, n.obs = NULL, alpha = 0.05, levels = NULL)  #Sophie = unweighted kappa = 0.91, confidence boundary=0.78-1.00, n=44 data points


#### GO NO GO

# did Brynna Hood pass interobserver reliability training?
# YES Cohen's unweighted kappa = 0.91
mm <- c(-1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, -1, 1, 0, 1, 
    0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 
    0, 1, 1, 0)  #live coder data (Maggie MacPherson) from CorrectChoice column for the same trials as in videos for bh
bh <- c(-1, 2, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, -1, 1, 0, 1, 
    0, 1, 0, 1, 2, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 
    0, 1, 1, 0)  #video coder data from CorrectChoice for videos A064LR 2019-12-06 Go:No-Go S1 T1.mp4, A064LR 2019-12-06 Go:No-Go S2 T19.mp4
dgng <- data.frame(mm, bh)
cohen.kappa(dgng, w = NULL, n.obs = NULL, alpha = 0.05, levels = NULL)  #unweighted kappa = 0.91, confidence boundaries 0.80-1.00, n=40 data points

# interobserver reliability on 20% of the videos between
# Brynna Hood and the live coder (LC)
data <- read.csv("/Users/corina/ownCloud/Documents/Experiments/Interobserver Reliability/InterObsRelGoNoGoLCBrynna.csv", 
    header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(data)  #Check to make sure it looks right
# Note: c(3,5) is telling R to look at columns 3
# ('1CorrectChoice') and 5 ('2CorrectChoiceAdapted') and
# compare them. Double check this:
data[, 3]  #1CorrectChoice = live coder
data[, 5]  #2CorrectChoiceAdapted = re-aligned Coder2 data from Brynna
cohen.kappa(data[, c(3, 5)], w = NULL, n.obs = NULL, alpha = 0.05, 
    levels = NULL)  #unweighted kappa = 0.82, confidence boundary = 0.78-0.85, n=611 data points

E. ANALYSIS PLAN

We do not plan to exclude any data. When missing data occur, the existing data for that individual will be included in the analyses for the tests they completed. Analyses will be conducted in R (current version 3.6.3; R Core Team (2017)). When there is more than one experimenter within a test, experimenter will be added as a random effect to account for potential differences between experimenters in conducting the tests. If there are no differences between models including or excluding experimenter as a random effect, then we will use the model without this random effect for simplicity.

Ability to detect actual effects

To begin to understand what kinds of effect sizes we will be able to detect given our sample size limitations and our interest in decreasing noise by attempting to measure it, which increases the number of explanatory variables, we used G*Power (v.3.1, Faul et al. (2007), Faul et al. (2009)) to conduct power analyses based on confidence intervals. G*Power uses pre-set drop down menus and we chose the options that were as close to our analysis methods as possible (listed in each analysis below). Note that there were no explicit options for GLMs (though the chosen test in G*Power appears to align with GLMs) or GLMMs or for the inclusion of the number of trials per bird (which are generally large in our investigation), thus the power analyses are only an approximation of the kinds of effect sizes we can detect. We realize that these power analyses are not fully aligned with our study design and that these kinds of analyses are not appropriate for Bayesian statistics (e.g., our MCMCglmm below), however we are unaware of better options at this time. Additionally, it is difficult to run power analyses because it is unclear what kinds of effect sizes we should expect due to the lack of data on this species for these experiments.

Data checking

The data will be visually checked to determine whether they are normally distributed via two methods: 1) normality is indicated when the histograms of actual data match those with simulated data (Figure 2), and 2) normality is indicated when the residuals closely fit the dotted line in the Normal Q-Q plot (Figure 3) (Zuur, Ieno, and Saveliev 2009). If the data do not appear normally distributed, visually check the residuals. If they are patternless, then assume a normal distribution (Zuur, Ieno, and Saveliev 2009).

d <- read.csv(url("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/g_inhibition_datasummary.csv"), 
    header = F, sep = ",", stringsAsFactors = F)
d <- data.frame(d)
colnames(d) <- c("Bird", "ID", "Gonogo150", "Gonogo85", "Detour", 
    "DetourA", "Detourprepost", "TrialsLast", "AvgLatency")

# Check the dependent variables for normality: Histograms
op <- par(mfrow = c(2, 4), mar = c(4, 4, 2, 0.2))
# This is what the distribution of actual data looks like
hist(d$Gonogo150, xlab = "Go no-go: Trials to criterion", main = "Actual Data")  #not normally distributed
hist(d$Gonogo85, xlab = "Go no-go: Trials to criterion", main = "Actual Data")  #much more normally distributed than Gonogo150
hist(d$Detour, xlab = "Detour: First approach", main = "Actual Data")

# Given the actual data, this is what a normal distribution
# would look like
Y2 <- rnorm(1281, mean = mean(d$Gonogo150, na.rm = TRUE), sd = sd(d$Gonogo150, 
    na.rm = TRUE))  #na.rm=TRUE excludes NAs, which would otherwise make mean() and sd() return NA
hist(Y2, xlab = "Go/no-go: Trials to criterion", main = "Simulated Data")  #Comparing histograms, the data are not normally distributed

Z2 <- rnorm(1281, mean = mean(d$Detour, na.rm = TRUE), sd = sd(d$Detour, 
    na.rm = TRUE))
hist(Z2, xlab = "Detour: First approach", main = "Simulated Data")


# Check the dependent variables for normality: Q-Q plots
op <- par(mfrow = c(3, 4), mar = c(4, 4, 2, 0.2))
plot(glm(d$Gonogo150 ~ d$TrialsLast))
plot(glm(d$Gonogo85 ~ d$TrialsLast))
plot(glm(d$Detour ~ d$TrialsLast))
# detour looks normally distributed, go no go 150 data are
# borderline, go no go 85 data are normally distributed.
# Analyze all three dependent variables.

## Check the dependent variables for normality: Residuals
op <- par(mfrow = c(1, 3), mar = c(4, 4, 2, 0.2))
plot(residuals(glm(d$Detour ~ d$TrialsLast)), ylab = "Detour residuals: First approach ~ Trials to reverse")
plot(residuals(glm(d$Gonogo150 ~ d$TrialsLast)), ylab = "Go/no-go 150: Residuals Correct response ~ Trials to reverse")
plot(residuals(glm(d$Gonogo85 ~ d$TrialsLast)), ylab = "Go/no-go 85: Residuals Correct response ~ Trials to reverse")
# All look patternless. Analyze all three dependent
# variables.

P1: delayed gratification

Assess food preferences: Conduct preference tests between pairs of different foods. Rank food preferences into three categories (High, Medium, Low) in the order of the percentage of times a food was chosen.

Analysis: Generalized Linear Model (GLM; glm function, stats package) with a Poisson distribution and log link, unless the only choices made were 0 (they didn’t wait for food) and 1 (they waited for 1 piece of food but not for 2 or 3), in which case we will use a binomial distribution with a logit link. We will determine whether an independent variable had an effect or not using the Estimate in the full model.
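To make concrete what "using the Estimate in the full model" means, here is a minimal sketch with simulated data (not the grackle dataset; the variable names and effect size are hypothetical): fit the Poisson GLM, then read the sign and magnitude of the Estimate and check whether its confidence interval excludes zero.

```r
# Hypothetical illustration only: judging an effect from the Estimate
# in a Poisson GLM, using simulated data with a built-in delay effect
set.seed(42)
delay <- rep(c(0, 10, 20), each = 20)  # hypothetical delay conditions (s)
waited <- rpois(60, lambda = exp(1 - 0.05 * delay))  # simulated counts
sim <- data.frame(waited, delay)
m <- glm(waited ~ delay, family = "poisson", data = sim)
summary(m)$coefficients  # Estimate column gives sign and size of the delay effect
confint(m)  # an interval excluding zero suggests a reliable effect
```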

To determine our ability to detect actual effects, we ran a power analysis in G*Power with the following settings: test family=F tests, statistical test=linear multiple regression: Fixed model (R^2 deviation from zero), type of power analysis=a priori, alpha error probability=0.05. We reduced the power to 0.70 and increased the effect size until the total sample size in the output matched our projected sample size (n=32). The protocol of the power analysis is here:

Input:

Effect size f² = 0.41

α err prob = 0.05

Power (1-β err prob) = 0.7

Number of predictors = 5

Output:

Noncentrality parameter λ = 13.1200000

Critical F = 2.5867901

Numerator df = 5

Denominator df = 26

Total sample size = 32

Actual power = 0.7103096

This means that, with our sample size of 32, we have a 71% chance of detecting a large effect (approximated at f² = 0.35 by Cohen (1988)).
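The same a priori calculation can be cross-checked in R with the pwr package (an assumption of this sketch; the preregistration itself used G*Power): pwr.f2.test() solves for the denominator degrees of freedom v given the effect size, alpha, power, and number of predictors u, and the total sample size is then v + u + 1.

```r
# Cross-check of the G*Power a priori analysis for P1 using the pwr
# package (pwr is not part of the preregistered analyses; shown only
# to make the calculation reproducible in R)
library(pwr)
res <- pwr.f2.test(u = 5, f2 = 0.41, sig.level = 0.05, power = 0.70)
res$v  # denominator df, approximately 26
ceiling(res$v) + 5 + 1  # total sample size, approximately 32
```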

acc <- read.csv("/Users/corina/GTGR/data/data_accumulation.csv", 
    header = T, sep = ",", stringsAsFactors = F)

# GLM
better <- glm(NumberOfAccumulationsWaited ~ Delay + FoodQualityQuantity + 
    Trial + TrialsToReverseLast, family = "poisson", data = acc)
# summary(better)

better1 <- summary(better)
library(xtable)
better1.table <- xtable(better1)
library(knitr)
kable(better1.table, caption = "Table U: Model selection output.", 
    format = "html", digits = 2)

P1: go no-go

Analysis:

Model 2a: Generalized Linear Model (GLM; glm function, stats package) with a Poisson distribution and a log link. We will determine whether an independent variable had an effect or not using the Estimate in the full model.

To determine our ability to detect actual effects, we ran a power analysis in G*Power with the following settings: test family=F tests, statistical test=linear multiple regression: Fixed model (R^2 deviation from zero), type of power analysis=a priori, alpha error probability=0.05. We reduced the power to 0.70 and increased the effect size until the total sample size in the output matched our projected sample size (n=32). The protocol of the power analysis is here:

Input:

Effect size f² = 0.27

α err prob = 0.05

Power (1-β err prob) = 0.7

Number of predictors = 2

Output:

Noncentrality parameter λ = 8.6400000

Critical F = 3.3276545

Numerator df = 2

Denominator df = 29

Total sample size = 32

Actual power = 0.7047420

This means that, with our sample size of 32, we have a 70% chance of detecting a medium to large effect (approximated at f² = 0.15 and f² = 0.35, respectively, by Cohen (1988)).

d <- read.csv(url("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/g_inhibition_datasummary.csv"), 
    header = F, sep = ",", stringsAsFactors = F)
d <- data.frame(d)
colnames(d) <- c("Bird", "ID", "Gonogo150", "Gonogo85", "Detour", 
    "DetourA", "Detourprepost", "TrialsLast", "AvgLatency")

# GLM
m1 <- glm(Gonogo150 ~ TrialsLast, family = "poisson", data = d)
sm1 <- summary(m1)  #there is no relationship between the number of trials to pass criterion in go no go and the number of trials to reverse a preference (in the last reversal)
library(xtable)
sm1.table <- xtable(sm1)
library(knitr)
kable(sm1.table, caption = "Table T: Model selection output.", 
    format = "html", digits = 2)

m2 <- glm(Gonogo85 ~ TrialsLast, family = "poisson", data = d)
sm2 <- summary(m2)  #there is no relationship between the number of trials to pass criterion in go no go and the number of trials to reverse a preference (in the last reversal) (though the estimate 0.005 is significant: p<2e-16)

Model 2b: A Generalized Linear Mixed Model (GLMM; MCMCglmm function, MCMCglmm package; (Hadfield 2010)) will be used with a Poisson distribution and log link using 13,000 iterations with a thinning interval of 10, a burnin of 3,000, and minimal priors (V=1, nu=0) (Hadfield 2014). We will ensure the GLMM shows acceptable convergence (lag time autocorrelation values <0.01; (Hadfield 2010)), and adjust parameters if necessary. We will determine whether an independent variable had an effect or not using the Estimate in the full model.

To roughly estimate our ability to detect actual effects (because these power analyses are designed for frequentist statistics, not Bayesian statistics), we ran a power analysis in G*Power with the following settings: test family=F tests, statistical test=linear multiple regression: Fixed model (R^2 deviation from zero), type of power analysis=a priori, alpha error probability=0.05. We reduced the power to 0.70 and increased the effect size until the total sample size in the output matched our projected sample size (n=32). The number of predictor variables was restricted to only the fixed effects because this test was not designed for mixed models. The protocol of the power analysis is here:

Input:

Effect size f² = 0.32

α err prob = 0.05

Power (1-β err prob) = 0.7

Number of predictors = 3

Output:

Noncentrality parameter λ = 10.2400000

Critical F = 2.9466853

Numerator df = 3

Denominator df = 28

Total sample size = 32

Actual power = 0.7061592

This means that, with our sample size of 32, we have a 71% chance of detecting a large effect (approximated at f² = 0.35 by Cohen (1988)).

go <- read.csv("/Users/corina/GTGR/data/data_golatency.csv", 
    header = T, sep = ",", stringsAsFactors = F)

# GLMM
library(MCMCglmm)
# One R-structure (residual) prior and one G-structure prior for the
# single random effect (ID), per the minimal priors (V=1, nu=0) above
prior = list(R = list(V = 1, nu = 0), G = list(G1 = list(V = 1, 
    nu = 0)))

golat <- MCMCglmm(LatencyToRespond ~ CorrectResponse * Trial * 
    FlexibilityCondition, random = ~ID, family = "poisson", data = go, 
    verbose = F, prior = prior, nitt = 13000, thin = 10, burnin = 3000)
summary(golat)
autocorr(golat$Sol)  #Did fixed effects converge?
autocorr(golat$VCV)  #Did random effects converge?

P1: detour

Analysis: Generalized Linear Model (GLM; glm function, stats package) with a binomial distribution and a logit link. We will determine whether an independent variable had an effect or not using the Estimate in the full model.

See the protocol for the power analysis for Model 2b above for a rough estimate of our ability to detect actual effects with this model.

d <- read.csv(url("https://raw.githubusercontent.com/corinalogan/grackles/master/Files/Preregistrations/g_inhibition_datasummary.csv"), 
    header = F, sep = ",", stringsAsFactors = F)
d <- data.frame(d)
colnames(d) <- c("Bird", "ID", "Gonogo150", "Gonogo85", "Detour", 
    "DetourA", "Detourprepost", "TrialsLast", "AvgLatency")

# GLM
d$ID <- factor(d$ID)
m3 <- glm(Detour ~ TrialsLast, family = "binomial", data = d)
sm3 <- summary(m3)  #there is no relationship between the proportion of trials correct on the detour task and the number of trials to reverse in the last reversal
library(xtable)
sm3.table <- xtable(sm3)
library(knitr)
kable(sm3.table, caption = "Table T: Model selection output.", 
    format = "html", digits = 2)

P1 alternative 2: are inhibition results reliable?

The reliability of the inhibition tests will be calculated using Cronbach’s Alpha (as in Friedman and Miyake (2004); R package: psych (Revelle 2017), function: alpha), which is indicated by std.alpha in the output.

rel <- read.csv("/Users/corina/GTGR/data/data_inhibition.csv", 
    header = T, sep = ",", stringsAsFactors = F)

library(psych)
reliab <- alpha(rel, check.keys = TRUE)  #Check.keys automatically reverses the coding for variables that are negatively correlated with the total scale. How to interpret: http://data.library.virginia.edu/using-and-interpreting-cronbachs-alpha/ 'If all of the scale items are entirely independent from one another (i.e., are not correlated or share no covariance), then alpha = 0; and, if all of the items have high covariances, then alpha will approach 1 as the number of items in the scale approaches infinity. In other words, the higher the alpha coefficient, the more the items have shared covariance and probably measure the same underlying concept.'
summary(reliab)
# Insert into text: `r reliab$std.alpha`

When comparing all three tests, alpha = [fill in result when data are available].

P2: correlation across inhibition tasks

See analysis description for P1 alternative 2.

rel2 <- read.csv("/Users/corina/GTGR/data/data_inhibition2.csv", 
    header = T, sep = ",", stringsAsFactors = F)

library(psych)
reliab2 <- alpha(rel2, check.keys = TRUE)
summary(reliab2)
# Insert into text: `r reliab2$std.alpha`

When analyzing only the delayed gratification and go no-go tasks, the reliability is alpha = [fill in result when data are available].

P3: does training improve detour performance?

Analysis: Generalized Linear Model (GLM; glm function, stats package) with a binomial distribution and a logit link. We will determine whether an independent variable had an effect or not using the Estimate in the full model.

To determine our ability to detect actual effects, we ran a power analysis in G*Power with the following settings: test family=F tests, statistical test=linear multiple regression: Fixed model (R^2 deviation from zero), type of power analysis=a priori, alpha error probability=0.05. We reduced the power to 0.70 and increased the effect size until the total sample size in the output matched our projected sample size (n=32). The protocol of the power analysis is here:

Input:

Effect size f² = 0.21

α err prob = 0.05

Power (1-β err prob) = 0.7

Number of predictors = 1

Output:

Noncentrality parameter λ = 6.7200000

Critical F = 4.1708768

Numerator df = 1

Denominator df = 30

Total sample size = 32

Actual power = 0.7083763

This means that, with our sample size of 32, we have a 71% chance of detecting a medium effect (approximated at f² = 0.15 by Cohen (1988)).

detour <- read.csv("/Users/corina/GTGR/data/data_detour.csv", 
    header = T, sep = ",", stringsAsFactors = F)

# GLM
de <- glm(FirstApproach ~ Condition, family = "binomial", data = detour)
sde <- summary(de)
library(xtable)
sde.table <- xtable(sde)
library(knitr)
kable(sde.table, caption = "Table T: Model selection output.", 
    format = "html", digits = 2)

Alternative Analyses

We anticipate that we will want to run additional/different analyses after reading McElreath (2016). We will revise this preregistration to include these new analyses before conducting the analyses above.

F. PLANNED SAMPLE

Great-tailed grackles are caught in the wild in Tempe, Arizona, USA for individual identification (colored leg bands in unique combinations). Some individuals (~32) are brought temporarily into aviaries for testing and then released back to the wild. Grackles are individually housed in an aviary (each 244 cm long by 122 cm wide by 213 cm tall) at Arizona State University for a maximum of three months, where they have ad lib access to water at all times and are fed Mazuri Small Bird maintenance diet ad lib during non-testing hours (minimum 20 h per day) and various other food items (e.g., peanuts, grapes, bread) during testing (up to 3 h per day per bird). Individuals are given three to four days to habituate to the aviaries, and their test battery begins on the fourth or fifth day (birds are usually tested six days per week; therefore, if their fourth day in the aviaries falls on a day off, they are tested on the fifth day instead).

Sample size rationale

We will test as many birds as we can in the approximately three years at this field site given that the birds only participate in tests in aviaries during the non-breeding season (approximately September through March). The minimum sample size will be 16, however we expect to be able to test up to 32 grackles.

Data collection stopping rule

We will stop testing birds once we have completed two full aviary seasons (likely in March 2020).

G. ETHICS

This research is carried out in accordance with permits from the:

  1. US Fish and Wildlife Service (scientific collecting permit number MB76700A-0,1,2)
  2. US Geological Survey Bird Banding Laboratory (federal bird banding permit number 23872)
  3. Arizona Game and Fish Department (scientific collecting license number SP594338 [2017], SP606267 [2018], and SP639866 [2019])
  4. Institutional Animal Care and Use Committee at Arizona State University (protocol number 17-1594R)
  5. University of Cambridge ethical review process (non-regulated use of animals in scientific procedures: zoo4/17)

H. AUTHOR CONTRIBUTIONS

Logan: Hypothesis development, experimental design (go no-go task), data collection, data analysis and interpretation, write up, revising/editing, materials/funding.

McCune: Data collection, data interpretation, revising/editing.

MacPherson: Data collection, data interpretation, revising/editing.

Johnson-Ulrich: Touchscreen programming for go no-go task, data interpretation, revising/editing.

Bergeron: Data collection, data interpretation, revising/editing.

Rowney: Data collection, data interpretation, revising/editing.

Seitz: Experimental design (go no-go task), touchscreen programming (go no-go task), data interpretation, revising/editing.

Blaisdell: Experimental design (go no-go task), data interpretation, revising/editing.

Folsom: Data collection, data interpretation, revising/editing.

Wascher: Hypothesis development, experimental design (delayed gratification and detour tasks), data analysis and interpretation, write up, revising/editing.

I. FUNDING

This research is funded by the Department of Human Behavior, Ecology and Culture at the Max Planck Institute for Evolutionary Anthropology, and by a Leverhulme Early Career Research Fellowship to Logan in 2017-2018.

J. CONFLICT OF INTEREST DISCLOSURE

We, the authors, declare that we have no financial conflicts of interest with the content of this article. Corina Logan is a Recommender and on the Managing Board at PCI Ecology.

K. ACKNOWLEDGEMENTS

We thank Dieter Lukas for help polishing the predictions; Ben Trumble for providing us with a wet lab at Arizona State University and Angela Bond for lab support; Melissa Wilson Sayres for sponsoring our affiliations at Arizona State University and lending lab equipment; Kevin Langergraber for serving as local PI on the ASU IACUC; Kristine Johnson for technical advice on great-tailed grackles; Arizona State University School of Life Sciences Department Animal Care and Technologies for providing space for our aviaries and for their excellent support of our daily activities; Julia Cissewski for tirelessly solving problems involving financial transactions and contracts; Richard McElreath for project support; Erin Vogel, our Recommender at PCI Ecology, and Simon Gingins and two anonymous reviewers for their wonderful feedback; Debbie Kelly for advice on how to modify the go no-go experiment; and our research assistants: Aelin Mayer, Nancy Rodriguez, Brianna Thomas, Aldora Messinger, Elysia Mamola, Michael Guillen, Rita Barakat, Adriana Boderash, Olateju Ojekunle, August Sevchik, Justin Huynh, Jennifer Berens, and Amanda Overholt.

L. REFERENCES

Chow, Pizza Ka Yee, Stephen EG Lea, and Lisa A Leaver. 2016. “How Practice Makes Perfect: The Role of Persistence, Flexibility and Learning in Problem-Solving Efficiency.” Animal Behaviour 112. Elsevier: 273–83.

Cohen, Jacob. 1988. “Statistical Power Analysis for the Behavioral Sciences 2nd Edn.” Erlbaum Associates, Hillsdale.

Faul, Franz, Edgar Erdfelder, Axel Buchner, and Albert-Georg Lang. 2009. “Statistical Power Analyses Using G*Power 3.1: Tests for Correlation and Regression Analyses.” Behavior Research Methods 41 (4). Springer: 1149–60.

Faul, Franz, Edgar Erdfelder, Albert-Georg Lang, and Axel Buchner. 2007. “G*Power 3: A Flexible Statistical Power Analysis Program for the Social, Behavioral, and Biomedical Sciences.” Behavior Research Methods 39 (2). Springer: 175–91.

Friedman, Naomi P, and Akira Miyake. 2004. “The Relations Among Inhibition and Interference Control Functions: A Latent-Variable Analysis.” Journal of Experimental Psychology: General 133 (1). American Psychological Association: 101.

Griffin, Andrea S, and David Guez. 2014. “Innovation and Problem Solving: A Review of Common Mechanisms.” Behavioural Processes 109. Elsevier: 121–34.

Hadfield, Jarrod D. 2010. “MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package.” Journal of Statistical Software 33 (2): 1–22. http://www.jstatsoft.org/v33/i02/.

Hadfield, JD. 2014. “MCMCglmm Course Notes.” http://cran.r-project.org/web/packages/MCMCglmm/vignettes/CourseNotes.pdf.

Harding, Emma J, Elizabeth S Paul, and Michael Mendl. 2004. “Animal Behaviour: Cognitive Bias and Affective State.” Nature 427 (6972). Nature Publishing Group: 312–12.

Hillemann, Friederike, Thomas Bugnyar, Kurt Kotrschal, and Claudia AF Wascher. 2014. “Waiting for Better, Not for More: Corvids Respond to Quality in Two Delay Maintenance Tasks.” Animal Behaviour 90. Elsevier: 1–10.

Lefebvre, Louis, Patrick Whittle, Evan Lascaris, and Adam Finkelstein. 1997. “Feeding Innovations and Forebrain Size in Birds.” Animal Behaviour 53 (3). Elsevier: 549–60.

MacLean, Evan L, Brian Hare, Charles L Nunn, Elsa Addessi, Federica Amici, Rindy C Anderson, Filippo Aureli, et al. 2014. “The Evolution of Self-Control.” Proceedings of the National Academy of Sciences 111 (20). National Acad Sciences: E2140–E2148.

McElreath, Richard. 2016. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press. http://xcelab.net/rm/statistical-rethinking/.

Mikhalevich, Irina, Russell Powell, and Corina Logan. 2017. “Is Behavioural Flexibility Evidence of Cognitive Complexity? How Evolution Can Inform Comparative Cognition.” Interface Focus 7 (3): 20160121. https://doi.org/10.1098/rsfs.2016.0121.

Peer, Brian D. 2011. “Invasion of the Emperor’s Grackle.” Ardeola 58 (2). BioOne: 405–9.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org.

Revelle, William. 2014. “Psych: Procedures for Psychological, Psychometric, and Personality Research.” Northwestern University, Evanston, Illinois 165: 1–10.

———. 2017. Psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois: Northwestern University. https://CRAN.R-project.org/package=psych.

Sol, Daniel, Richard P Duncan, Tim M Blackburn, Phillip Cassey, and Louis Lefebvre. 2005. “Big Brains, Enhanced Cognition, and Response of Birds to Novel Environments.” Proceedings of the National Academy of Sciences of the United States of America 102 (15). National Acad Sciences: 5460–5.

Sol, Daniel, and Louis Lefebvre. 2000. “Behavioural Flexibility Predicts Invasion Success in Birds Introduced to New Zealand.” Oikos 90 (3). Wiley Online Library: 599–605.

Sol, Daniel, Sarah Timmermans, and Louis Lefebvre. 2002. “Behavioural Flexibility and Invasion Success in Birds.” Animal Behaviour 63 (3). Elsevier: 495–502.

Wehtje, Walter. 2003. “The Range Expansion of the Great-Tailed Grackle (Quiscalus Mexicanus Gmelin) in North America Since 1880.” Journal of Biogeography 30 (10). Wiley Online Library: 1593–1607.

Zuur, Alain F., Elena N. Ieno, and Anatoly A. Saveliev. 2009. Mixed Effects Models and Extensions in Ecology with R. Springer.